35 research outputs found

    Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech

    Speaker variability is a well-known problem of state-of-the-art Automatic Speech Recognition (ASR) systems. In particular, handling children's speech is challenging because of substantial differences in the pronunciation of speech units between adult and child speakers. To build accurate ASR systems for all types of speakers, Hidden Markov Models (HMMs) with Gaussian mixture densities have been used intensively in combination with model adaptation techniques. This paper compares different ways to improve the recognition of children's speech and describes a novel approach relying on a Class-Structured Gaussian Mixture Model (GMM). A common solution for reducing speaker variability relies on gender and age adaptation. First, it is proposed to replace gender and age by unsupervised clustering. The resulting speaker classes are first used for adaptation of the conventional HMM. Second, the speaker classes are used for initializing a structured GMM, where the components of the Gaussian densities are structured with respect to the speaker classes. In a first approach, the mixture weights of the structured GMM are set dependent on the speaker class. In a second approach, the mixture weights are replaced by explicit dependencies between Gaussian components of mixture densities (as in stranded GMMs, but here the GMMs are class-structured). The different approaches are evaluated and compared on the TIDIGITS task. The best improvement is achieved when the structured GMM is combined with feature adaptation.
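The first approach in the abstract above (Gaussian components shared across speaker classes, with mixture weights that depend on the class) can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function names, the two-component mixture, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def class_weighted_gmm_likelihood(x, means, variances, class_weights, speaker_class):
    """p(x | class) = sum_k w[class][k] * N(x; mean_k, var_k).

    The Gaussian components are shared by all classes; only the
    mixture weights change with the speaker class.
    """
    weights = class_weights[speaker_class]
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical toy model: two shared components, two speaker classes.
means = [0.0, 3.0]
variances = [1.0, 1.0]
class_weights = {
    "class_a": [0.9, 0.1],  # class_a mostly uses the first component
    "class_b": [0.1, 0.9],  # class_b mostly uses the second component
}

for name in class_weights:
    print(name, class_weighted_gmm_likelihood(0.0, means, variances, class_weights, name))
```

In this toy setup, an observation near the first component is scored higher under the class whose weights favour that component, which is the sense in which the weights carry the class information.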

    Design, development and field evaluation of a Spanish into sign language translation system

    This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The developed system focuses on helping Deaf people when they want to renew their Driver’s License. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. For the final version, the implemented language translator combines all the alternatives into a hierarchical structure. This paper includes a detailed description of the field evaluation. This evaluation was carried out in the Local Traffic Office in Toledo involving real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and a discussion on how to solve them (some of them specific for LSE)

    Factors Associated with the Performance of a Blood-Based Interferon-γ Release Assay in Diagnosing Tuberculosis

    Background: Indeterminate results are a recognised limitation of interferon-γ release assays (IGRA) in the diagnosis of latent tuberculosis (TB) infection (LTBI) and TB disease, especially in children. We investigated whether age and common co-morbidities were associated with IGRA performance in an unselected cohort of resettled refugees. Methods: A retrospective cross-sectional study of refugees presenting for their post-resettlement health assessment during 2006 and 2007. Refugees were investigated for prevalent infectious diseases, including TB, and for common nutritional deficiencies and haematological abnormalities as part of standard clinical screening protocols. Tuberculosis screening was performed by IGRA: QuantiFERON-TB Gold in 2006 and QuantiFERON-TB Gold In-Tube in 2007. Results: Complete data were available on 1130 refugees, of whom 573 (51%) were children less than 17 years and 1041 (92%) were from sub-Saharan Africa. All individuals were HIV negative. A definitive IGRA result was obtained in 1004 (89%) refugees, 264 (26%) of which were positive; 256 (97%) had LTBI and 8 (3%) had TB disease. An indeterminate IGRA result was obtained in 126 (11%) refugees (all failed positive mitogen control). In multivariate analysis, younger age (linear OR = 0.93 [95% CI 0.91-0.95],

    Pain in elderly people with severe dementia: A systematic review of behavioural pain assessment tools

    BACKGROUND: Pain is a common and major problem among nursing home residents. The prevalence of pain among elderly nursing home residents is 40–80%, showing that they are at great risk of experiencing pain. Since assessment of pain is an important step towards the treatment of pain, there is a need for manageable, valid and reliable tools to assess pain in elderly people with dementia. METHODS: This systematic review identifies pain assessment scales for elderly people with severe dementia and evaluates the psychometric properties and clinical utility of these instruments. Relevant publications in English, German, French or Dutch, from 1988 to 2005, were identified by means of an extensive search strategy in Medline, PsycINFO and CINAHL, supplemented by screening citations and references. Quality judgement criteria were formulated and used to evaluate the psychometric aspects of the scales. RESULTS: Twenty-nine publications reporting on behavioural pain assessment instruments were selected for this review. Twelve observational pain assessment scales (DOLOPLUS2; ECPA; ECS; Observational Pain Behavior Tool; CNPI; PACSLAC; PAINAD; PADE; RaPID; Abbey Pain Scale; NOPPAIN; Pain assessment scale for use with cognitively impaired adults) were identified. Findings indicate that most observational scales are under development and show moderate psychometric qualities. CONCLUSION: Based on the psychometric qualities and criteria regarding sensitivity and clinical utility, we conclude that PACSLAC and DOLOPLUS2 are the most appropriate scales currently available. Further research should focus on improving these scales by further testing their validity, reliability and clinical utility

    Early Social Cognition: Alternatives to Implicit Mindreading

    According to the BD-model of mindreading, we primarily understand others in terms of beliefs and desires. In this article we review a number of objections against explicit versions of the BD-model, and discuss the prospects of using its implicit counterpart as an explanatory model of early emerging socio-cognitive abilities. Focusing on recent findings on so-called ‘implicit’ false belief understanding, we put forward a number of considerations against the adoption of an implicit BD-model. Finally, we explore a different way to make sense of implicit false belief understanding in terms of keeping track of affordances

    Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation

    This paper investigates deep neural network (DNN) based nonlinear feature mapping and statistical linear feature adaptation approaches for reducing reverberation in speech signals. In the nonlinear feature mapping approach, a DNN is trained on a parallel clean/distorted speech corpus to map reverberant and noisy speech coefficients (such as the log magnitude spectrum) to the underlying clean speech coefficients. The constraints imposed by dynamic features (i.e., the time derivatives of the speech coefficients) are used to enhance the smoothness of the predicted coefficient trajectories in two ways. One is to obtain the enhanced speech coefficients by least-squares estimation from the coefficients and dynamic features predicted by the DNN. The other is to incorporate the dynamic feature constraint directly into the DNN training process using a sequential cost function. In the linear feature adaptation approach, a sparse linear transform, called the cross transform, is used to transform multiple frames of speech coefficients to a new feature space. The transform is estimated to maximize the likelihood of the transformed coefficients given a model of clean speech coefficients. Unlike the DNN approach, no parallel corpus is used and no assumption on distortion types is made. The two approaches are evaluated on the REVERB Challenge 2014 tasks. Both speech enhancement and automatic speech recognition (ASR) results show that the DNN-based mappings significantly reduce the reverberation in speech and improve both speech quality and ASR performance. For the speech enhancement task, the proposed dynamic feature constraint helps to improve the cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log likelihood ratio metrics, while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross transform feature adaptation significantly improves ASR performance for acoustic models trained on clean-condition data.
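The least-squares step mentioned in the abstract above (recovering a smooth coefficient trajectory from predicted static and dynamic features) can be sketched for a single coefficient track as follows. The abstract does not give the delta window, the weighting, or the boundary handling, so the choices here are assumptions: a delta defined as d[t] = 0.5 * (c[t+1] - c[t-1]) with clamped edges, equal weighting of the static and delta terms, and hypothetical function names throughout.

```python
def delta_matrix(num_frames):
    """Matrix W with (W c)[t] = 0.5 * (c[t+1] - c[t-1]); edge frames are clamped."""
    W = [[0.0] * num_frames for _ in range(num_frames)]
    for t in range(num_frames):
        W[t][min(t + 1, num_frames - 1)] += 0.5
        W[t][max(t - 1, 0)] -= 0.5
    return W


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def smooth_trajectory(static_pred, delta_pred):
    """Minimise ||c - static_pred||^2 + ||W c - delta_pred||^2 over c.

    Setting the gradient to zero gives the normal equations
    (I + W^T W) c = static_pred + W^T delta_pred.
    """
    n = len(static_pred)
    W = delta_matrix(n)
    A = [[(1.0 if i == j else 0.0) + sum(W[k][i] * W[k][j] for k in range(n))
          for j in range(n)] for i in range(n)]
    b = [static_pred[i] + sum(W[k][i] * delta_pred[k] for k in range(n))
         for i in range(n)]
    return solve_linear(A, b)


# Hypothetical example: a jagged static prediction paired with flat (zero) deltas.
noisy_static = [0.0, 2.0, 0.0, 2.0, 0.0]
flat_deltas = [0.0] * len(noisy_static)
print(smooth_trajectory(noisy_static, flat_deltas))
```

When the predicted deltas already agree with the predicted statics, the solution reproduces the statics unchanged; when they disagree, the estimate is a compromise between the two predictions.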

    Dealing With Loss of Synchronism in Multi-Band Continuous Speech Recognition Systems

    Book chapter. In multi-band systems, the signal is decomposed into several frequency bands, which are processed separately. The recombination stage must then compute a single sentence from all these different solutions. The task is quite easy in isolated word recognition, where each word ends at the same time in every band, but it becomes more difficult in continuous speech recognition, where each band has a different segmentation. The problem here is to decide when the recombination should be done. Two major solutions have been tested. The first one introduces synchronism between the bands, and recombination is done when all the bands are synchronous. The second one leaves the sub-recognizers totally independent and tries to extract from their solutions a phonetic structure that allows the recombination to be performed. We briefly present an example of the first solution, then focus on the algorithm we have developed for the second one.